Modern Time Series Forecasting with Python
Packages & Datasets

Marco Zanotti

Packages

The course uses a Python toolkit organized by role.

For data wrangling we use pandas and polars. pytimetk streamlines time-aware feature engineering, preprocessing, and visualization, with Plotly for interactive charts.

For modeling we rely on scikit-learn for pipelines and preprocessing, and the Nixtla ecosystem for forecasting:

  • statsforecast for classical/statistical models and efficient forecasting primitives
  • mlforecast for machine learning models (linear, tree-based) with exogenous features
  • neuralforecast for deep learning models
  • utilsforecast and coreforecast for backtesting utilities, evaluation, and feature generators

For hosted models, Nixtla’s TimeGPT is accessed via the nixtla Python client (API key required). Agents are explored via the timecopilot package.

All required packages are managed with conda using the provided environment file:

conda env create -f src/env-setup/conda_env_setup.yml
conda activate modern_tsf
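
Once the environment is active, a quick check that the packages listed above can be located may save time later. The snippet below is a minimal sketch using only the standard library. The import names are assumptions (for example, scikit-learn is imported as sklearn), and the helper check_packages is hypothetical, not part of the course code.

```python
from importlib.util import find_spec

# Import names are assumptions based on the package list above;
# note that scikit-learn is imported as "sklearn".
REQUIRED = [
    "pandas", "polars", "pytimetk", "sklearn",
    "statsforecast", "mlforecast", "neuralforecast",
    "utilsforecast", "coreforecast", "nixtla", "timecopilot",
]


def check_packages(names):
    """Return {import_name: bool}: True if the module can be located."""
    status = {}
    for name in names:
        try:
            # find_spec locates a top-level module without importing it.
            status[name] = find_spec(name) is not None
        except (ImportError, ValueError):
            status[name] = False
    return status


status = check_packages(REQUIRED)
missing = [name for name, found in status.items() if not found]
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, recreating the environment from the provided file is usually the simplest fix.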

Datasets

Email Subscribers

A company decided to change how it sells its products, moving from a purely physical store to a more modern, digital solution. It therefore opened an online web store with an integrated e-commerce platform, where its “virtual” customers can buy all the merchandise.
To monitor this new business solution, it adopted a few well-known data analytics tools.

Google Analytics has been set up on the web store pages to collect data on page views, sessions, and organic searches. These data can help the company understand whether its website is gaining popularity.

Moreover, MailChimp is used to track all the customers that buy a product and subscribe to the web store.

Finally, marketing events such as discount programs and new product launches are promoted through several social network channels.

All these data are stored in the company database and can be used to analyze the factors that affect web store sales.

M4 Competition Hourly

The M4 Competition is a well-known time series forecasting competition organized by Spyros Makridakis. It provides a large dataset of 100,000 time series from various domains, including finance, economics, and demographics. The goal of the competition is to develop accurate forecasting models for these series.

https://www.unic.ac.cy/iff/research/forecasting/m-competitions/m4/

We will use a sample of the M4 Hourly dataset. It contains multiple hourly time series, each identified by a unique ID.
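
To make the "multiple series, one ID each" layout concrete, the sketch below builds a small synthetic panel in long format with pandas. The column names unique_id, ds, and y follow the convention used by the Nixtla libraries. The values, the series IDs, and the helper make_series are made up for illustration and are not the actual M4 data.

```python
import numpy as np
import pandas as pd


def make_series(uid, n_hours, level):
    """Build one synthetic hourly series in long format (illustrative only)."""
    ds = pd.Timestamp("2024-01-01") + pd.to_timedelta(np.arange(n_hours), unit="h")
    y = level + (np.arange(n_hours) % 24)  # simple repeating daily pattern
    return pd.DataFrame({"unique_id": uid, "ds": ds, "y": y})


# Stack the series on top of each other: one row per (series, timestamp).
panel = pd.concat(
    [make_series("H1", 48, 100.0), make_series("H2", 72, 10.0)],
    ignore_index=True,
)

# Per-series summary: number of observations and covered time span.
summary = panel.groupby("unique_id").agg(
    n_obs=("y", "size"),
    start=("ds", "min"),
    end=("ds", "max"),
)
print(summary)
```

Keeping all series in a single long table, rather than one column per series, allows series of different lengths to coexist and makes per-series operations a simple groupby on the ID column.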